1 Introduction

In this part of my project I refine my research questions and further examine the effects of the pandemic on recent MCPS high school graduates enrolled at Montgomery College. For the purposes of this study I limit my dataset to MCPS students aged 20 or younger. These students are divided further into subgroups based on gender and race. The datasets used in this part of my project were already cleaned in my initial data analysis.

2 Data Dictionary

For the purposes of this project, the following variables and definitions are important.

Terminology:
Fall2019 refers to the incoming freshman cohort in Fall2019. This is term year 2020.
Fall2020 refers to the incoming freshman cohort in Fall2020. This is term year 2021.

Variables of Interest:
term_year: incoming students in Fall 2019 are assigned to term year 2020; incoming students in Fall 2020 are assigned to term year 2021.
hours_earned: credit hours the student earned in their first Fall semester. This can include credits earned in the second summer session (Summer 2) and AP credits earned in high school.
hours_attempted: credit and non-credit hours the student attempted in their first Fall semester. This may include credits attempted in the second summer session (Summer 2).
full_part: whether the student is full-time (FT) or part-time (PT). This classification is based on the student's self-reported information in the admissions application. Students are classified as full-time if they intend to take at least 12 credits.
major: degree programme the student is registered for, or certificate & LR (letter of recommendation). All certificates and letters of recommendation have been grouped together.
hours_earned_rate: ratio of hours_earned to hours_attempted.
age: age of the student at the start of the program.
race: racial classification of the student. This is based on the IPEDS system. Foreign students are identified as foreign and not by their race/ethnicity.
sex: gender classification of the student.
high_school: name of the high school the student graduated from. Public high schools in Montgomery County are classified as MCPS.
pell_grant: whether the student received a Pell grant or not.
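
The ratio behind hours_earned_rate can be sketched in base R as follows. This is only an illustration: the helper name earned_rate, the toy values, and the choice to return NA when no hours were attempted are assumptions, not part of the original cleaning code.

```r
# Hypothetical helper: ratio of earned to attempted credit hours.
# Returns NA when no hours were attempted, to avoid dividing by zero.
earned_rate <- function(hours_earned, hours_attempted) {
  ifelse(hours_attempted > 0, hours_earned / hours_attempted, NA_real_)
}

# Toy values (made up, not taken from the dataset)
toy <- data.frame(hours_earned    = c(12, 6, 0, 16),
                  hours_attempted = c(12, 12, 9, 5))
toy$hours_earned_rate <- earned_rate(toy$hours_earned, toy$hours_attempted)
toy
```

The last toy row has a rate above 1. A value like this may arise when hours_earned includes credits, such as AP credits, that are not part of hours_attempted.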

3 Data Wrangling

3.1 Import Data

Summary of Data and Types

skim(df_Degrees)
Data summary
Name df_Degrees
Number of rows 7123
Number of columns 23
_______________________
Column type frequency:
character 15
logical 1
numeric 7
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
sex 0 1.00 1 1 0 4 0
race 0 1.00 5 22 0 9 0
age 0 1.00 4 7 0 5 0
high_school 0 1.00 7 30 0 163 0
full_part 0 1.00 2 2 0 2 0
city 19 1.00 5 19 0 127 0
stat_code 19 1.00 2 2 0 16 0
pell_grant 0 1.00 1 1 0 2 0
camp_code 140 0.98 1 1 0 6 0
major 0 1.00 1 61 0 34 0
pass_engl 0 1.00 1 1 0 2 0
pass_math 0 1.00 1 1 0 2 0
summer2 0 1.00 1 1 0 1 0
fall 0 1.00 1 1 0 1 0
HS_classify 0 1.00 2 14 0 7 0

Variable type: logical

skim_variable n_missing complete_rate mean count
MCPS 0 1 0.7 TRU: 4963, FAL: 2160

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
u_number 0 1 20196625.60 5027.06 20190001 20191872.50 20193733.00 20201703.5 20203588.0 ▇▃▁▂▇
zip 19 1 20886.64 1559.40 1460 20853.00 20877.00 20903.0 94025.0 ▁▇▁▁▁
hours_attempted 0 1 12.46 6.23 1 9.00 12.00 15.0 54.0 ▆▇▁▁▁
hours_earned 0 1 7.85 7.43 0 3.00 6.00 12.0 54.0 ▇▃▁▁▁
mc_gpa 0 1 2.19 1.47 0 0.67 2.50 3.5 4.0 ▆▂▃▅▇
term_year 0 1 2020.47 0.50 2020 2020.00 2020.00 2021.0 2021.0 ▇▁▁▁▇
hours_earned_rate 0 1 0.57 0.38 0 0.23 0.64 1.0 3.2 ▇▇▁▁▁

Change Datatypes

df_Degrees$u_number<- as.character(df_Degrees$u_number)
df_Degrees$term_year<- as.character(df_Degrees$term_year)

3.2 Create a data frame of students who graduated from MCPS high schools and are 20 years old or younger

Use the data frame df_Degrees, which was cleaned in the initial data analysis. Keep all MCPS students who are 20 years old or younger.

df_MCPS20D<-df_Degrees %>%                    
         filter(HS_classify=="MCPS")%>%    # keep students who graduated from MCPS high schools
         filter(age %in% c("< 18", "18 - 20")) # keep students who are 20 years old or younger
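
The same two-step filter can be sketched in base R on a toy data frame. The rows below are made up, and the "21 - 24" age band label is a hypothetical value used only to show a row being excluded; only the column names and the "< 18" and "18 - 20" labels come from the code above.

```r
# Toy data frame using the same column names as df_Degrees (values are made up)
toy <- data.frame(
  HS_classify = c("MCPS", "MCPS", "Private", "MCPS"),
  age         = c("18 - 20", "21 - 24", "< 18", "< 18"),
  stringsAsFactors = FALSE
)

# Keep MCPS graduates in the two youngest age bands
young_bands <- c("< 18", "18 - 20")
toy_mcps20 <- toy[toy$HS_classify == "MCPS" & toy$age %in% young_bands, ]
nrow(toy_mcps20)
```

Because age is stored as a character band rather than a number, the filter matches band labels exactly, so any change in spacing inside a label would silently drop rows.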

4 Demographics of students who graduated from MCPS high schools and are 20 years old or younger

4.1 Full time versus Part-time Degree Students

Frequency of Students Part time versus Full time: 2020 vs 2021

# Number of part time and full time students, 2020 vs 2021
ggplot(data=df_MCPS20D, aes(x=full_part, fill=full_part)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=2,size=3)+
      facet_wrap(~term_year)+
      ggtitle("Number of Students Full time versus Part time")+
      ylab('Frequency')+
      xlab("")+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())

Proportion of Students Full time versus Part time: 2020 vs 2021

df_MCPS20D %>% 
    group_by(term_year) %>% 
    count(full_part) %>% 
    mutate(prop = n/sum(n)) %>% 
    ggplot(aes(x = full_part, y = prop)) +
    geom_col(aes(fill = full_part), position = "dodge") +
    geom_text(aes(label = scales::percent(prop), 
                  y = prop, 
                  group = full_part),
              position = position_dodge(width = 0.9),
              vjust = 2,size=3)+
   facet_wrap(~term_year)+
      ggtitle("Proportion of Students Full time versus Part time")+
      ylab('Percentage')+
      xlab("")+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())
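
The within-year proportion computed by mutate(prop = n/sum(n)) can be checked by hand. The sketch below is a base R illustration, assuming the full time and part time counts reported in this section; the data frame name counts is made up for the example.

```r
# Counts of full time and part time students per term year, as reported in this section
counts <- data.frame(
  term_year = c("2020", "2020", "2021", "2021"),
  full_part = c("FT", "PT", "FT", "PT"),
  n         = c(1655, 801, 1556, 747)
)

# ave() applies sum() within each term_year, giving the yearly total on every row
counts$prop <- counts$n / ave(counts$n, counts$term_year, FUN = sum)
round(counts$prop * 100, 1)
```

The proportions sum to 1 within each term year, which is what the facetted bar chart displays.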

# change in overall MCPS student population from 2020 to 2021

df_MCPS20D%>%
          group_by(term_year,full_part)%>%
          count(full_part)%>%
          group_by(term_year)%>%
          mutate(total_pop =sum(n))%>%
          group_by(full_part)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 4 x 5
## # Groups:   full_part [2]
##   term_year full_part     n total_pop pct_change
##   <chr>     <chr>     <int>     <int>      <dbl>
## 1 2020      FT         1655      2456      NA   
## 2 2021      FT         1556      2303      -5.98
## 3 2020      PT          801      2456      NA   
## 4 2021      PT          747      2303      -6.74

In term year 2021 there was a 5.98% decrease in full time students who graduated from MCPS high schools and a 6.74% decrease in part time students who graduated from MCPS high schools.
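
As a worked check of the percentage change formula (n - lag(n)) / lag(n) * 100, the following base R sketch recomputes both figures from the counts in the table. The helper name pct_change is an assumption introduced for illustration.

```r
# Hypothetical helper: percentage change from an earlier count to a later one
pct_change <- function(before, after) (after - before) / before * 100

round(pct_change(1655, 1556), 2)  # full time, term year 2020 -> 2021
round(pct_change(801, 747), 2)    # part time, term year 2020 -> 2021
```

A negative result indicates a decrease, so the sign should be dropped when the sentence already says "decrease".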

4.2 Race

Count of Race Groups

ggplot(data=df_MCPS20D, aes(x=race, fill=race)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=0,size=3)+
      facet_wrap(~term_year + full_part)+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())+
      ggtitle("Number of Students per Race Group")+
      xlab("Race")+
      ylab("Frequency")

Full time student: Change in enrollment from 2020 to 2021 based on Race

# calculate percentage change in full time student enrollment from 2020 to 2021 by  race

df_MCPS20D%>%
          filter(full_part=="FT")%>%
          group_by(term_year,race)%>%
          count(race)%>%
          group_by(race)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 18 x 4
## # Groups:   race [9]
##    term_year race                       n pct_change
##    <chr>     <chr>                  <int>      <dbl>
##  1 2020      Am. Indian / AK Native     5      NA   
##  2 2021      Am. Indian / AK Native     1     -80   
##  3 2020      Asian                    272      NA   
##  4 2021      Asian                    227     -16.5 
##  5 2020      Black / African Am.      389      NA   
##  6 2021      Black / African Am.      326     -16.2 
##  7 2020      Foreign                  103      NA   
##  8 2021      Foreign                   96      -6.80
##  9 2020      Hawaiian / Pac. Isl.       5      NA   
## 10 2021      Hawaiian / Pac. Isl.       3     -40   
## 11 2020      Hispanic                 534      NA   
## 12 2021      Hispanic                 596      11.6 
## 13 2020      Multi-Race                71      NA   
## 14 2021      Multi-Race                63     -11.3 
## 15 2020      Unknown                   11      NA   
## 16 2021      Unknown                    3     -72.7 
## 17 2020      White                    265      NA   
## 18 2021      White                    241      -9.06

Full time students: There was a 16.5% decline in Asian students, a 16.2% decline in African American students, a 9.1% decline in White students and a 6.8% decline in foreign students. Hispanic students increased by 11.6%.

Part time student: Change in enrollment from 2020 to 2021 based on Race

# calculate percentage change in part time student enrollment from 2020 to 2021 by race

df_MCPS20D%>%
          filter(full_part=="PT")%>%
          group_by(term_year,race)%>%
          count(race)%>%
          group_by(race)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 18 x 4
## # Groups:   race [9]
##    term_year race                       n pct_change
##    <chr>     <chr>                  <int>      <dbl>
##  1 2020      Am. Indian / AK Native     4      NA   
##  2 2021      Am. Indian / AK Native     1     -75   
##  3 2020      Asian                     69      NA   
##  4 2021      Asian                     63      -8.70
##  5 2020      Black / African Am.      177      NA   
##  6 2021      Black / African Am.      181       2.26
##  7 2020      Foreign                   73      NA   
##  8 2021      Foreign                   54     -26.0 
##  9 2020      Hawaiian / Pac. Isl.       1      NA   
## 10 2021      Hawaiian / Pac. Isl.       1       0   
## 11 2020      Hispanic                 327      NA   
## 12 2021      Hispanic                 263     -19.6 
## 13 2020      Multi-Race                33      NA   
## 14 2021      Multi-Race                35       6.06
## 15 2020      Unknown                    5      NA   
## 16 2021      Unknown                    2     -60   
## 17 2020      White                    112      NA   
## 18 2021      White                    147      31.2

Part time students: There was an 8.7% decrease in Asian students, a 26% decrease in foreign students, a 19.6% decrease in Hispanic students and a 2.3% increase in African American students. There was a 31.25% increase in White students.

4.3 Gender

Gender of Students

# Gender of students part time and full time  2020 vs 2021
ggplot(data=df_MCPS20D, aes(x=sex, fill=sex)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=1,size=3)+
      facet_wrap(~term_year+full_part)+
      ggtitle("Gender of Students: Full time versus Part time")+
      ylab('Frequency')+
      xlab("")+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())

Calculate percentage change in full time student enrollment from 2020 to 2021 by gender

# calculate percentage change in full time student enrollment from 2020 to 2021 by  gender

df_MCPS20D%>%
          filter(full_part=="FT")%>%
          filter(sex=="F"|sex =="M")%>%
          group_by(term_year,sex)%>%
          count(sex)%>%
          group_by(sex)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 4 x 4
## # Groups:   sex [2]
##   term_year sex       n pct_change
##   <chr>     <chr> <int>      <dbl>
## 1 2020      F       793      NA   
## 2 2021      F       819       3.28
## 3 2020      M       842      NA   
## 4 2021      M       719     -14.6

Full time students: There was a 14.6% decrease in male students and a 3.28% increase in female students.

Calculate percentage change in part time student enrollment from 2020 to 2021 by gender

# calculate percentage change in part time student enrollment from 2020 to 2021 by  gender

df_MCPS20D%>%
          filter(full_part=="PT")%>%
          filter(sex=="F"|sex =="M")%>%
          group_by(term_year,sex)%>%
          count(sex)%>%
          group_by(sex)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 4 x 4
## # Groups:   sex [2]
##   term_year sex       n pct_change
##   <chr>     <chr> <int>      <dbl>
## 1 2020      F       381      NA   
## 2 2021      F       345      -9.45
## 3 2020      M       401      NA   
## 4 2021      M       395      -1.50

Part time students: There was a 9.45% decrease in female students and a 1.5% decrease in male students.

Gender and Race breakdown of full time students

# Gender and Race of full time students  2020 vs 2021

df_MCPS20D%>%
      filter(sex %in% c("F","M"))%>%
      filter(full_part=="FT")%>%
      ggplot(., aes(x=race, fill=race)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=0, size=3)+
      facet_wrap(~term_year+sex)+
      ggtitle("Gender and Race of Full time Students")+
      ylab('Frequency')+
      xlab("")+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())

Full time students: percentage change in enrollment by gender and race

# calculate percentage change in student enrollment from 2020 to 2021 by race and gender

# create data frames with counts of full time students by race and gender
df_MCPS20D%>%
          filter(full_part=="FT")%>%
          filter(sex=="F"|sex =="M")%>%
          group_by(term_year,race,sex)%>%
          count(sex)%>%
          group_by(race,sex)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 35 x 5
## # Groups:   race, sex [18]
##    term_year race                   sex       n pct_change
##    <chr>     <chr>                  <chr> <int>      <dbl>
##  1 2020      Am. Indian / AK Native F         4      NA   
##  2 2020      Am. Indian / AK Native M         1      NA   
##  3 2021      Am. Indian / AK Native M         1       0   
##  4 2020      Asian                  F       111      NA   
##  5 2021      Asian                  F       115       3.60
##  6 2020      Asian                  M       159      NA   
##  7 2021      Asian                  M       110     -30.8 
##  8 2020      Black / African Am.    F       178      NA   
##  9 2021      Black / African Am.    F       169      -5.06
## 10 2020      Black / African Am.    M       202      NA   
## # … with 25 more rows

Part time students: percentage change in enrollment by gender and race

# calculate percentage change in student enrollment from 2020 to 2021 by race and gender

# count part time students by race and gender, then compute the change between term years
df_MCPS20D%>%
          filter(full_part=="PT")%>%
          filter(sex=="F"|sex =="M")%>%
          group_by(term_year,race,sex)%>%
          count(sex)%>%
          group_by(race,sex)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 31 x 5
## # Groups:   race, sex [17]
##    term_year race                   sex       n pct_change
##    <chr>     <chr>                  <chr> <int>      <dbl>
##  1 2020      Am. Indian / AK Native M         4      NA   
##  2 2021      Am. Indian / AK Native M         1     -75   
##  3 2020      Asian                  F        30      NA   
##  4 2021      Asian                  F        19     -36.7 
##  5 2020      Asian                  M        37      NA   
##  6 2021      Asian                  M        44      18.9 
##  7 2020      Black / African Am.    F        79      NA   
##  8 2021      Black / African Am.    F        84       6.33
##  9 2020      Black / African Am.    M        96      NA   
## 10 2021      Black / African Am.    M        94      -2.08
## # … with 21 more rows

4.4 Pell Grant

Need to correct file

# Pell Grant
ggplot(data=df_MCPS20D, aes(x=pell_grant, fill=pell_grant)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=3, size=3)+
      facet_wrap(~term_year+full_part)+
      ggtitle("Pell grant")+
      ylab('Frequency')+
      xlab("")+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())

Proportion of students receiving Pell grants

df_MCPS20D %>% 
    group_by(term_year,full_part) %>% 
    count(pell_grant) %>% 
    mutate(prop = n/sum(n)) %>% 
    ggplot(aes(x = pell_grant, y = prop)) +
    geom_col(aes(fill = pell_grant), position = "dodge") +
    geom_text(aes(label = scales::percent(prop,0.1), 
                  y = prop, 
                  group = pell_grant),
              position = position_dodge(width = 0.9),
              vjust = 0,size=3)+
    facet_wrap(~term_year + full_part)+
      ggtitle("Proportion of Students receiving Pell Grants")+
      ylab('Proportion ')+
      xlab("")+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())

4.5 Majors

Overall Majors trend

Count of Majors in Full time students in 2020

z1<- df_MCPS20D%>%
      filter(full_part=="FT" &term_year =="2020")%>%
       ggplot(., aes(x=major, fill=major)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
      ggtitle("Majors of Full-time Students in 2020  ")+
      xlab("Major")+
      ylab("Frequency")+
    theme(legend.position = "none") 
       
z1 + coord_flip()

Count of Majors in Full time students in 2021

z13<- df_MCPS20D%>%
      filter(full_part=="FT" &term_year =="2021")%>%
       ggplot(., aes(x=major, fill=major)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
      ggtitle("Majors of Full-time Students in 2021  ")+
      xlab("Major")+
      ylab("Frequency")+
    theme(legend.position = "none") 
       
z13 + coord_flip()

Calculate percentage change in full time student majors from 2020 to 2021

df_MCPS20D%>%
          filter(full_part=="FT")%>%
          group_by(term_year,major)%>%
          count(major)%>%
          group_by(major)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 62 x 4
## # Groups:   major [33]
##    term_year major                        n pct_change
##    <chr>     <chr>                    <int>      <dbl>
##  1 2020      0                            3      NA   
##  2 2021      0                            2     -33.3 
##  3 2020      American Sign Language       5      NA   
##  4 2021      American Sign Language       1     -80   
##  5 2020      Applied Geography            1      NA   
##  6 2021      Applied Geography            2     100   
##  7 2020      Architectural Technology    15      NA   
##  8 2021      Architectural Technology    19      26.7 
##  9 2020      Art                         24      NA   
## 10 2021      Art                         22      -8.33
## # … with 52 more rows

Count of Majors in Part time students in 2020

z11<- df_MCPS20D%>%
      filter(full_part=="PT" &term_year =="2020")%>%
       ggplot(., aes(x=major, fill=major)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
      ggtitle("Majors of Part-time Students in 2020  ")+
      xlab("Major")+
      ylab("Frequency")+
    theme(legend.position = "none") 
       
z11 + coord_flip()

Count of Majors in Part time students in 2021

z12<- df_MCPS20D%>%
      filter(full_part=="PT" &term_year =="2021")%>%
       ggplot(., aes(x=major, fill=major)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
      ggtitle("Majors of Part-time Students in 2021  ")+
      xlab("Major")+
      ylab("Frequency")+
    theme(legend.position = "none") 
       
z12 + coord_flip()

Calculate percentage change in part time student majors from 2020 to 2021

df_MCPS20D%>%
          filter(full_part=="PT")%>%
          group_by(term_year,major)%>%
          count(major)%>%
          group_by(major)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 60 x 4
## # Groups:   major [32]
##    term_year major                        n pct_change
##    <chr>     <chr>                    <int>      <dbl>
##  1 2020      0                            5       NA  
##  2 2020      American Sign Language       1       NA  
##  3 2021      American Sign Language       2      100  
##  4 2020      Applied Geography            2       NA  
##  5 2020      Architectural Technology    13       NA  
##  6 2021      Architectural Technology     4      -69.2
##  7 2020      Art                         12       NA  
##  8 2021      Art                         14       16.7
##  9 2020      Broadcast Media              5       NA  
## 10 2021      Broadcast Media              4      -20  
## # … with 50 more rows

4.6 High Schools

4.6.1 Full time Student

MCPS high schools attended by full time students in term year 2020

df_MCPS20D%>%
          filter(full_part=="FT" & term_year=="2020")%>%
          group_by(term_year,high_school)%>%
          count(high_school)%>%
          group_by(term_year)%>%
          mutate(total_pop =sum(n))%>%
          group_by(high_school)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_pop= (n/total_pop*100))%>%
          arrange(desc(pct_pop))
## # A tibble: 25 x 5
## # Groups:   high_school [25]
##    term_year high_school                        n total_pop pct_pop
##    <chr>     <chr>                          <int>     <int>   <dbl>
##  1 2020      Gaithersburg High School         132      1655    7.98
##  2 2020      Montgomery Blair High School     105      1655    6.34
##  3 2020      Northwest HS - Germantown         92      1655    5.56
##  4 2020      Paint Branch High School          92      1655    5.56
##  5 2020      Springbrook Sr High School        91      1655    5.50
##  6 2020      Wheaton High School               80      1655    4.83
##  7 2020      Clarksburg High School            76      1655    4.59
##  8 2020      Richard Montgomery High School    76      1655    4.59
##  9 2020      Colonel Zadok Magruder HS         75      1655    4.53
## 10 2020      Albert Einstein HS & MC Art Cn    74      1655    4.47
## # … with 15 more rows

MCPS high schools attended by full time students in term year 2021

df_MCPS20D%>%
          filter(full_part=="FT" & term_year=="2021")%>%
          group_by(term_year,high_school)%>%
          count(high_school)%>%
          group_by(term_year)%>%
          mutate(total_pop =sum(n))%>%
          group_by(high_school)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_pop= (n/total_pop*100))%>%
          arrange(desc(pct_pop))
## # A tibble: 25 x 5
## # Groups:   high_school [25]
##    term_year high_school                        n total_pop pct_pop
##    <chr>     <chr>                          <int>     <int>   <dbl>
##  1 2021      Montgomery Blair High School      97      1556    6.23
##  2 2021      Paint Branch High School          91      1556    5.85
##  3 2021      Wheaton High School               90      1556    5.78
##  4 2021      Gaithersburg High School          89      1556    5.72
##  5 2021      Northwest HS - Germantown         86      1556    5.53
##  6 2021      Colonel Zadok Magruder HS         84      1556    5.40
##  7 2021      Richard Montgomery High School    78      1556    5.01
##  8 2021      Watkins Mill High School          75      1556    4.82
##  9 2021      Clarksburg High School            74      1556    4.76
## 10 2021      James Hubert Blake High School    70      1556    4.50
## # … with 15 more rows
# calculate percentage change in full time student enrollment from 2020 to 2021 by MCPS highschool
df_MCPS20D%>%
          filter(full_part=="FT")%>%
          group_by(term_year,high_school)%>%
          count(high_school)%>%
          group_by(high_school)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100)%>%
          arrange(desc(pct_change))
## # A tibble: 50 x 4
## # Groups:   high_school [25]
##    term_year high_school                        n pct_change
##    <chr>     <chr>                          <int>      <dbl>
##  1 2021      Rockville High School             65      41.3 
##  2 2021      Wheaton High School               90      12.5 
##  3 2021      Colonel Zadok Magruder HS         84      12   
##  4 2021      Walt Whitman High School          20      11.1 
##  5 2021      Seneca Valley High School         55      10   
##  6 2021      Sherwood High School              68       9.68
##  7 2021      Bethesda Chevy Chase High Schl    43       4.88
##  8 2021      Watkins Mill High School          75       4.17
##  9 2021      Richard Montgomery High School    78       2.63
## 10 2021      Thomas Sprigg Wootton High Sch    33       0   
## # … with 40 more rows
v1<- df_MCPS20D %>% 
    group_by(term_year,full_part) %>% 
    filter(full_part=="FT" & term_year=="2020")%>%
    count(high_school) %>% 
    mutate(prop = n/sum(n)) %>% 
    ggplot(aes(x = high_school, y = prop)) +
    geom_col(aes(fill=high_school), position = "dodge") +
    geom_text(aes(label = scales::percent(prop,0.5), 
                  y = prop, 
                  group = high_school),
              position = position_dodge(width = 0.9),
              vjust = 0, size=3, hjust=0)+
  #  facet_wrap(~term_year )+
      ggtitle("High schools of full time students, term year 2020")+
      ylab('Proportion ')+
      xlab("")+
      theme(legend.position = "none", axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank()) 
  
v1+ coord_flip()  

v1<- df_MCPS20D %>% 
    group_by(term_year,full_part) %>% 
    filter(full_part=="FT" & term_year=="2021")%>%
    count(high_school) %>% 
    mutate(prop = n/sum(n)) %>% 
    ggplot(aes(x = high_school, y = prop)) +
    geom_col(aes(fill=high_school), position = "dodge") +
    geom_text(aes(label = scales::percent(prop,0.5), 
                  y = prop, 
                  group = high_school),
              position = position_dodge(width = 0.9),
              vjust = 0, size=3, hjust=0)+
  #  facet_wrap(~term_year )+
      ggtitle("High schools of full time students, term year 2021")+
      ylab('Proportion ')+
      xlab("")+
      theme(legend.position = "none", axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank()) 
  
v1+ coord_flip()  

4.6.2 Part time Student

MCPS high schools attended by part time students in term year 2020

df_MCPS20D%>%
          filter(full_part=="PT" & term_year=="2020")%>%
          group_by(term_year,high_school)%>%
          count(high_school)%>%
          group_by(term_year)%>%
          mutate(total_pop =sum(n))%>%
          group_by(high_school)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_pop= (n/total_pop*100))%>%
          arrange(desc(pct_pop))
## # A tibble: 25 x 5
## # Groups:   high_school [25]
##    term_year high_school                        n total_pop pct_pop
##    <chr>     <chr>                          <int>     <int>   <dbl>
##  1 2020      Northwest HS - Germantown         61       801    7.62
##  2 2020      John F. Kennedy High School       56       801    6.99
##  3 2020      Gaithersburg High School          55       801    6.87
##  4 2020      Montgomery Blair High School      54       801    6.74
##  5 2020      Albert Einstein HS & MC Art Cn    44       801    5.49
##  6 2020      Clarksburg High School            43       801    5.37
##  7 2020      Paint Branch High School          41       801    5.12
##  8 2020      Richard Montgomery High School    38       801    4.74
##  9 2020      Watkins Mill High School          38       801    4.74
## 10 2020      Rockville High School             37       801    4.62
## # … with 15 more rows

MCPS high schools attended by part time students in term year 2021

df_MCPS20D%>%
          filter(full_part=="PT" & term_year=="2021")%>%
          group_by(term_year,high_school)%>%
          count(high_school)%>%
          group_by(term_year)%>%
          mutate(total_pop =sum(n))%>%
          group_by(high_school)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_pop= (n/total_pop*100))%>%
          arrange(desc(pct_pop))
## # A tibble: 25 x 5
## # Groups:   high_school [25]
##    term_year high_school                        n total_pop pct_pop
##    <chr>     <chr>                          <int>     <int>   <dbl>
##  1 2021      Gaithersburg High School          48       747    6.43
##  2 2021      Northwest HS - Germantown         48       747    6.43
##  3 2021      Montgomery Blair High School      39       747    5.22
##  4 2021      Colonel Zadok Magruder HS         38       747    5.09
##  5 2021      Northwood High School             38       747    5.09
##  6 2021      Paint Branch High School          37       747    4.95
##  7 2021      Quince Orchard Sr High School     35       747    4.69
##  8 2021      Walter Johnson High School        35       747    4.69
##  9 2021      Albert Einstein HS & MC Art Cn    34       747    4.55
## 10 2021      Richard Montgomery High School    33       747    4.42
## # … with 15 more rows
# calculate percentage change in part time student enrollment from 2020 to 2021 by MCPS high school
df_MCPS20D%>%
          filter(full_part=="PT")%>%
          group_by(term_year,high_school)%>%
          count(high_school)%>%
          group_by(high_school)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100)%>%
          arrange(desc(pct_change))
## # A tibble: 50 x 4
## # Groups:   high_school [25]
##    term_year high_school                        n pct_change
##    <chr>     <chr>                          <int>      <dbl>
##  1 2021      Thomas Sprigg Wootton High Sch    24     100   
##  2 2021      Walter Johnson High School        35      75   
##  3 2021      Winston Churchill High School     15      66.7 
##  4 2021      Poolesville Jr-Sr High School     14      55.6 
##  5 2021      Northwood High School             38      52   
##  6 2021      Sherwood High School              30      36.4 
##  7 2021      Walt Whitman High School          14      27.3 
##  8 2021      Colonel Zadok Magruder HS         38      15.2 
##  9 2021      James Hubert Blake High School    30      11.1 
## 10 2021      Damascus High School              19       5.56
## # … with 40 more rows
v3<- df_MCPS20D %>% 
    group_by(term_year,full_part) %>% 
    filter(full_part=="PT" & term_year=="2020")%>%
    count(high_school) %>% 
    mutate(prop = n/sum(n)) %>% 
    ggplot(aes(x = high_school, y = prop)) +
    geom_col(aes(fill=high_school), position = "dodge") +
    geom_text(aes(label = scales::percent(prop,0.5), 
                  y = prop, 
                  group = high_school),
              position = position_dodge(width = 0.9),
              vjust = 0, size=3, hjust=0)+
  #  facet_wrap(~term_year )+
      ggtitle("High schools of part time students, term year 2020")+
      ylab('Proportion ')+
      xlab("")+
      theme(legend.position = "none", axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank()) 
  
v3+ coord_flip()  

v4<- df_MCPS20D %>% 
    group_by(term_year,full_part) %>% 
    filter(full_part=="PT" & term_year=="2021")%>%
    count(high_school) %>% 
    mutate(prop = n/sum(n)) %>% 
    ggplot(aes(x = high_school, y = prop)) +
    geom_col(aes(fill=high_school), position = "dodge") +
    geom_text(aes(label = scales::percent(prop,0.5), 
                  y = prop, 
                  group = high_school),
              position = position_dodge(width = 0.9),
              vjust = 0, size=3, hjust=0)+
  #  facet_wrap(~term_year )+
      ggtitle("High schools of part-time students, term year 2021")+
      ylab('Proportion ')+
      xlab("")+
      theme(legend.position = "none", axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank()) 
  
v4 + coord_flip()  

5 Hours Attempted

Boxplots of hours_attempted by year by MCPS students 20yrs and younger

p11 = ggplot(df_MCPS20D, aes(hours_attempted))
p11 + geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~full_part)

Students who register for more than 18 credits require special permission from the department. Furthermore, a full-time student is classified as someone who is enrolled in 12 or more credits, and a part-time student as someone who is enrolled in fewer than 12 credits. However, based on the dataset, a number of full-time students attempt fewer than 12 credits and a large number of part-time students attempt more than 12 credits. This is possible because full_part is based on the intent the student reports in the admissions application, not on the credits actually attempted.
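
One way to quantify this mismatch is to cross-tabulate the declared status against the credits actually attempted. The sketch below uses a small toy table with invented values; the column names follow the data dictionary, and the 12-credit cut-off is the one stated above.

```r
library(dplyr)

# Toy data (hypothetical values, not from df_MCPS20D)
toy <- tibble(
  full_part       = c("FT", "FT", "FT", "PT", "PT", "PT"),
  hours_attempted = c(15, 9, 12, 6, 13, 12)
)

status_check <- toy %>%
  # label each student by the load they actually attempted
  mutate(load = if_else(hours_attempted >= 12, "12 or more", "under 12")) %>%
  count(full_part, load)

status_check
```

Rows where full_part is "FT" but load is "under 12", or "PT" but load is "12 or more", are the students whose attempted credits do not match their declared status.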

Boxplots of hours_attempted by year by Full time MCPS students 20yrs and younger

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_attempted))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Boxplots of hours_attempted by year by Part time MCPS students 20yrs and younger

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_attempted))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

There are not many outliers in the part time student groups. Term year 2021 seems to have more outliers on the upper end.

Density plot of hours_attempted by year

ggplot(df_MCPS20D, aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~full_part)+
  xlab("Hours attempted") +
  ylab( "Density")+
   ggtitle(" Hours Attempted by Full-time Students vs Part-time Students")

Hours attempted by full time students

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours attempted") +
  ylab( "Density") +
  ggtitle(" Hours Attempted by Full-time Students")

Fivenum Summary of Full time students

df_MCPS20D%>% filter(full_part=="FT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(hours_attempted)[1],
            Q1 = fivenum(hours_attempted)[2],
            median = fivenum(hours_attempted)[3],
            Q3 = fivenum(hours_attempted)[4],
            max = fivenum(hours_attempted)[5],
            mean= mean(hours_attempted),
            sd = sd(hours_attempted))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race               term_year     n   min    Q1 median    Q3   max  mean    sd
##    <chr>              <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Am. Indian / AK N… 2020          5     6  13       17  19      36  18.2 11.1 
##  2 Am. Indian / AK N… 2021          1    13  13       13  13      13  13   NA   
##  3 Asian              2020        272     6  13       15  20      52  17.7  8.13
##  4 Asian              2021        227     7  13       15  17      46  16.8  7.00
##  5 Black / African A… 2020        389     5  12       13  14      42  13.7  3.53
##  6 Black / African A… 2021        326     4  12       14  16      38  14.9  4.31
##  7 Foreign            2020        103     7  12       14  17      31  14.8  4.26
##  8 Foreign            2021         96     7  12.5     15  16      37  15.7  5.29
##  9 Hawaiian / Pac. I… 2020          5     9  12       13  13      15  12.4  2.19
## 10 Hawaiian / Pac. I… 2021          3    12  15.5     19  24.5    30  20.3  9.07
## 11 Hispanic           2020        534     4  12       13  15      39  14.2  4.43
## 12 Hispanic           2021        596     3  12       14  16      43  15.0  4.33
## 13 Multi-Race         2020         71     6  12       13  17      44  16.7  8.04
## 14 Multi-Race         2021         63     6  12       14  16.5    43  15.7  6.22
## 15 Unknown            2020         11     9  12       14  15      31  15    5.78
## 16 Unknown            2021          3    12  12       12  13      14  12.7  1.15
## 17 White              2020        265     8  12       13  16      46  15.9  7.08
## 18 White              2021        241     7  13       14  17      54  16.5  6.37
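
Two details of the summary above are worth keeping in mind when reading it. The Q1 and Q3 columns come from fivenum(), which returns Tukey's hinges; these can differ slightly from the default quartiles of quantile(). The sd column is NA for groups with a single student because the sample standard deviation is undefined for one observation. A short illustration on a made-up vector:

```r
# Made-up vector, for illustration only
x <- c(1, 2, 3, 4)

hinges    <- fivenum(x)                   # 1.0 1.5 2.5 3.5 4.0
quartiles <- quantile(x, c(0.25, 0.75))   # 1.75 and 3.25 (default type 7)

# Standard deviation of a single observation is NA
single_sd <- sd(13)

hinges
quartiles
single_sd
```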

Hours attempted by part time students

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours attempted") +
  ylab( "Density")+
   ggtitle(" Hours Attempted by Part-time Students")

Fivenum Summary of Part time students

df_MCPS20D%>% filter(full_part=="PT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(hours_attempted)[1],
            Q1 = fivenum(hours_attempted)[2],
            median = fivenum(hours_attempted)[3],
            Q3 = fivenum(hours_attempted)[4],
            max = fivenum(hours_attempted)[5],
            mean= mean(hours_attempted),
            sd = sd(hours_attempted))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race               term_year     n   min    Q1 median    Q3   max  mean    sd
##    <chr>              <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Am. Indian / AK N… 2020          4     3   3      5.5   8.5     9  5.75  3.20
##  2 Am. Indian / AK N… 2021          1     6   6      6     6       6  6    NA   
##  3 Asian              2020         69     2   6      9    10      33  8.62  4.94
##  4 Asian              2021         63     3   7.5    9    11      21  8.90  3.64
##  5 Black / African A… 2020        177     1   6      7     9      15  7.28  2.62
##  6 Black / African A… 2021        181     1   6      8    10      25  7.80  3.37
##  7 Foreign            2020         73     3   6      8    10      23  8.18  3.89
##  8 Foreign            2021         54     3   5      9    10      29  8.61  4.38
##  9 Hawaiian / Pac. I… 2020          1     6   6      6     6       6  6    NA   
## 10 Hawaiian / Pac. I… 2021          1     5   5      5     5       5  5    NA   
## 11 Hispanic           2020        327     1   6      8     9      21  7.84  3.06
## 12 Hispanic           2021        263     1   6      8    11      42  8.73  4.41
## 13 Multi-Race         2020         33     1   4      8     9      12  7.03  2.98
## 14 Multi-Race         2021         35     3   6      9    10      26  8.34  3.90
## 15 Unknown            2020          5     7   9     10    10      10  9.2   1.30
## 16 Unknown            2021          2     4   4      6.5   9       9  6.5   3.54
## 17 White              2020        112     1   6      8    10      33  8.15  4.47
## 18 White              2021        147     3   5      8    10      39  8.43  5.02

6 Hours Earned

Boxplots of Hours Earned by year by MCPS students 20yrs and younger

p11 = ggplot(df_MCPS20D, aes(hours_earned))
p11 + geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~full_part)

Boxplots of hours_earned by year by Full time MCPS students 20yrs and younger

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_earned))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Boxplots of hours_earned by year by Part time MCPS students 20yrs and younger

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_earned))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

There are not many outliers in the part time student groups. Term year 2021 seems to have more outliers on the upper end.

Density plot of hours_earned by year

ggplot(df_MCPS20D, aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~full_part)+
  xlab("Hours Earned") +
  ylab( "Density")+
  ggtitle(" Hours Earned by Full-time vs Part-time Students")

Hours_earned by full time students

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours Earned") +
  ylab( "Density")+
   ggtitle(" Hours Earned by Full-time Students")

Fivenum Summary of Full time students

df_MCPS20D%>% filter(full_part=="FT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(hours_earned)[1],
            Q1 = fivenum(hours_earned)[2],
            median = fivenum(hours_earned)[3],
            Q3 = fivenum(hours_earned)[4],
            max = fivenum(hours_earned)[5],
            mean= mean(hours_earned),
            sd = sd(hours_earned))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race               term_year     n   min    Q1 median    Q3   max  mean    sd
##    <chr>              <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Am. Indian / AK N… 2020          5     0  10       14  19      36 15.8  13.3 
##  2 Am. Indian / AK N… 2021          1    13  13       13  13      13 13    NA   
##  3 Asian              2020        272     0   9       13  17      52 14.7   9.15
##  4 Asian              2021        227     0   9       12  16      46 13.4   8.53
##  5 Black / African A… 2020        389     0   6        9  12      42  8.85  5.58
##  6 Black / African A… 2021        326     0   6        9  13      37  9.55  6.53
##  7 Foreign            2020        103     0   6        9  13      31 10.4   6.44
##  8 Foreign            2021         96     0   6       10  13      37 10.6   7.56
##  9 Hawaiian / Pac. I… 2020          5     0   0        9  12      13  6.8   6.38
## 10 Hawaiian / Pac. I… 2021          3     9  12.5     16  23      30 18.3  10.7 
## 11 Hispanic           2020        534     0   6        9  12      38  9.57  6.50
## 12 Hispanic           2021        596     0   6       10  13      33  9.73  6.36
## 13 Multi-Race         2020         71     0   7       12  15      44 13.4   9.76
## 14 Multi-Race         2021         63     0   6       10  13.5    43 10.8   8.69
## 15 Unknown            2020         11     3   5        9  13      31 10.5   7.90
## 16 Unknown            2021          3     3   5        7   9.5    12  7.33  4.51
## 17 White              2020        265     0   7       11  15      46 12.3   8.88
## 18 White              2021        241     0   7       12  15      54 12.5   8.22

Hours_earned by part time students

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours Earned") +
  ylab( "Density")+
   ggtitle(" Hours Earned by Part-time Students")

Fivenum Summary of Part time students

df_MCPS20D%>% filter(full_part=="PT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(hours_earned)[1],
            Q1 = fivenum(hours_earned)[2],
            median = fivenum(hours_earned)[3],
            Q3 = fivenum(hours_earned)[4],
            max = fivenum(hours_earned)[5],
            mean= mean(hours_earned),
            sd = sd(hours_earned))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race              term_year     n   min    Q1 median    Q3   max  mean     sd
##    <chr>             <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1 Am. Indian / AK … 2020          4     0   1.5    3       3     3  2.25  1.5  
##  2 Am. Indian / AK … 2021          1     3   3      3       3     3  3    NA    
##  3 Asian             2020         69     0   0      4       6    33  4.81  5.50 
##  4 Asian             2021         63     0   3      3       6    21  4.73  4.48 
##  5 Black / African … 2020        177     0   0      3       4    11  2.73  2.82 
##  6 Black / African … 2021        181     0   0      1       6    22  2.81  3.72 
##  7 Foreign           2020         73     0   0      3       6    21  3.96  4.60 
##  8 Foreign           2021         54     0   0      0       6    29  3.09  5.04 
##  9 Hawaiian / Pac. … 2020          1     0   0      0       0     0  0    NA    
## 10 Hawaiian / Pac. … 2021          1     3   3      3       3     3  3    NA    
## 11 Hispanic          2020        327     0   0      3       6    21  3.48  3.84 
## 12 Hispanic          2021        263     0   0      3       6    42  4.65  5.14 
## 13 Multi-Race        2020         33     0   1      3       9    11  4.27  3.83 
## 14 Multi-Race        2021         35     0   0      3       6    26  4.11  4.95 
## 15 Unknown           2020          5     0   1      1       4     9  3     3.67 
## 16 Unknown           2021          2     3   3      3.5     4     4  3.5   0.707
## 17 White             2020        112     0   0      4       7    27  4.74  4.87 
## 18 White             2021        147     0   3      4       7    33  5.16  5.14

7 GPA

Boxplots of GPA by year by MCPS students 20yrs and younger

p11 = ggplot(df_MCPS20D, aes(mc_gpa))
p11 + geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~full_part)

Boxplots of GPA by year by Full time MCPS students 20yrs and younger

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(mc_gpa))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Boxplots of GPA by year by Part time MCPS students 20yrs and younger

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(mc_gpa))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Density plot of GPA by year

ggplot(df_MCPS20D, aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~full_part)+
  xlab("GPA") +
  ylab( "Density")+
  ggtitle(" GPA by Full-time vs Part-time Students")

GPA by full time students

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("GPA") +
  ylab( "Density")+
   ggtitle(" GPA of Full-time Students")

Fivenum Summary of Full time students

df_MCPS20D%>% filter(full_part=="FT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(mc_gpa)[1],
            Q1 = fivenum(mc_gpa)[2],
            median = fivenum(mc_gpa)[3],
            Q3 = fivenum(mc_gpa)[4],
            max = fivenum(mc_gpa)[5],
            mean= mean(mc_gpa),
            sd = sd(mc_gpa))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race              term_year     n   min    Q1 median    Q3   max  mean     sd
##    <chr>             <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1 Am. Indian / AK … 2020          5  0     2.35   2.9   3.5   4     2.55  1.55 
##  2 Am. Indian / AK … 2021          1  2.77  2.77   2.77  2.77  2.77  2.77 NA    
##  3 Asian             2020        272  0     2.33   3.3   3.73  4     2.93  1.03 
##  4 Asian             2021        227  0     2.5    3.23  3.71  4     2.88  1.12 
##  5 Black / African … 2020        389  0     1.5    2.5   3.14  4     2.25  1.18 
##  6 Black / African … 2021        326  0     1.33   2.67  3.4   4     2.31  1.30 
##  7 Foreign           2020        103  0     2      3     3.65  4     2.71  1.20 
##  8 Foreign           2021         96  0     1.46   2.82  3.69  4     2.48  1.35 
##  9 Hawaiian / Pac. … 2020          5  0     0      2.25  2.67  3.77  1.74  1.68 
## 10 Hawaiian / Pac. … 2021          3  1.75  2.22   2.68  3.34  4     2.81  1.13 
## 11 Hispanic          2020        534  0     1.5    2.70  3.44  4     2.38  1.25 
## 12 Hispanic          2021        596  0     1.23   2.66  3.33  4     2.29  1.30 
## 13 Multi-Race        2020         71  0     2      2.75  3.5   4     2.59  1.13 
## 14 Multi-Race        2021         63  0     1.5    2.6   3.54  4     2.37  1.35 
## 15 Unknown           2020         11  0.33  2.12   2.33  3.32  4     2.55  1.00 
## 16 Unknown           2021          3  2.55  2.65   2.75  3.38  4     3.1   0.786
## 17 White             2020        265  0     1.8    3     3.6   4     2.59  1.22 
## 18 White             2021        241  0     2      3     3.69  4     2.67  1.26

GPA of Part time students

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("GPA") +
  ylab( "Density")+
   ggtitle(" GPA of Part-time Students")

Fivenum Summary of Part time students

df_MCPS20D%>% filter(full_part=="PT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(mc_gpa)[1],
            Q1 = fivenum(mc_gpa)[2],
            median = fivenum(mc_gpa)[3],
            Q3 = fivenum(mc_gpa)[4],
            max = fivenum(mc_gpa)[5],
            mean= mean(mc_gpa),
            sd = sd(mc_gpa))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race              term_year     n   min    Q1 median    Q3   max  mean     sd
##    <chr>             <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1 Am. Indian / AK … 2020          4     0  0.5    1.25  2.25     3  1.38  1.25 
##  2 Am. Indian / AK … 2021          1     2  2      2     2        2  2    NA    
##  3 Asian             2020         69     0  0      2.3   3.33     4  2.01  1.54 
##  4 Asian             2021         63     0  0.8    2     3.28     4  1.94  1.48 
##  5 Black / African … 2020        177     0  0      1.33  2.71     4  1.46  1.38 
##  6 Black / African … 2021        181     0  0      0.33  2.33     4  1.13  1.32 
##  7 Foreign           2020         73     0  0      2     3        4  1.65  1.51 
##  8 Foreign           2021         54     0  0      0     2.67     4  1.20  1.46 
##  9 Hawaiian / Pac. … 2020          1     0  0      0     0        0  0    NA    
## 10 Hawaiian / Pac. … 2021          1     4  4      4     4        4  4    NA    
## 11 Hispanic          2020        327     0  0      1.5   3        4  1.60  1.50 
## 12 Hispanic          2021        263     0  0      2     3        4  1.73  1.43 
## 13 Multi-Race        2020         33     0  0.67   2     3.5      4  1.99  1.51 
## 14 Multi-Race        2021         35     0  0      2.5   3        4  1.80  1.55 
## 15 Unknown           2020          5     0  0.75   2     3.67     4  2.08  1.75 
## 16 Unknown           2021          2     3  3      3.5   4        4  3.5   0.707
## 17 White             2020        112     0  0      2     3.33     4  1.86  1.54 
## 18 White             2021        147     0  0.55   2.5   3.33     4  2.16  1.49

Hours Earned Rate

Density plot of Hours Earned Rate by year

ggplot(df_MCPS20D, aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.3) +
  facet_wrap(~full_part)+
  xlab("Hours Earned Rate") +
  ylab( "Density")+
  xlim(0,1)
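
Note that xlim(0,1) removes any observation outside that range before the density is estimated (ggplot2 reports the removed rows in a warning). Since hours_earned can include credits such as AP credits, it may be worth checking whether any rate exceeds 1. The sketch below shows such a check on toy values; it assumes the rate is simply hours_earned / hours_attempted as given in the data dictionary, and the numbers are invented.

```r
library(dplyr)

# Toy data (hypothetical values); in the last row earned exceeds attempted
toy <- tibble(
  hours_attempted = c(12, 15, 6, 3),
  hours_earned    = c(12, 9, 0, 6)
)

rates <- toy %>%
  mutate(
    hours_earned_rate     = hours_earned / hours_attempted,
    outside_unit_interval = hours_earned_rate < 0 | hours_earned_rate > 1
  )

# Number of rows that xlim(0, 1) would drop from the plot
n_dropped <- sum(rates$outside_unit_interval)
n_dropped
```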

Boxplots of Hours Earned Rate of Full time MCPS students 20yrs and younger

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_earned_rate))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Boxplots of Hours Earned Rate of Part time MCPS students 20yrs and younger

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_earned_rate))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Hours Earned Rate of full time students

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours Earned Rate") +
  ylab( "Density")+
   ggtitle(" Hours Earned Rate of Full-time Students")

Hours Earned Rate of part time students

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours Earned Rate") +
  ylab( "Density")+
   ggtitle(" Hours Earned Rate of Part-time Students")

8 Distribution of Variables and Correlation

Distribution of Variables and Correlation : Full time Students 2020

library(GGally)
# plot distributions and correlation of variables
df_MCPS20D%>% filter(term_year=="2020")%>%
              filter(full_part=="FT")%>%
              ggpairs(., columns = c("hours_attempted","hours_earned", "mc_gpa","hours_earned_rate"))

Distribution of Variables and Correlation : Full time Students 2021

library(GGally)
# plot distributions and correlation of variables
df_MCPS20D%>% filter(term_year=="2021")%>%
              filter(full_part=="FT")%>%
              ggpairs(., columns = c("hours_attempted","hours_earned", "mc_gpa","hours_earned_rate"))

Distribution of Variables and Correlation : Part time Students 2020

library(GGally)
# plot distributions and correlation of variables
df_MCPS20D%>% filter(term_year=="2020")%>%
              filter(full_part=="PT")%>%
              ggpairs(., columns = c("hours_attempted","hours_earned", "mc_gpa","hours_earned_rate"))

Distribution of Variables and Correlation : Part time Students 2021

library(GGally)
# plot distributions and correlation of variables
df_MCPS20D%>% filter(term_year=="2021")%>%
              filter(full_part=="PT")%>%
              ggpairs(., columns = c("hours_attempted","hours_earned", "mc_gpa","hours_earned_rate"))
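
The correlations shown in the ggpairs panels can also be obtained as plain matrices, which makes the term years easier to compare side by side. The following is a minimal sketch on toy data; the values are invented, and only the column names are taken from the analysis above.

```r
library(dplyr)

# Toy data (hypothetical values, not from df_MCPS20D)
toy <- tibble(
  term_year       = rep(c("2020", "2021"), each = 4),
  hours_attempted = c(12, 15, 13, 16, 12, 14, 15, 18),
  hours_earned    = c(9, 15, 6, 16, 12, 10, 15, 12),
  mc_gpa          = c(2.5, 3.8, 1.9, 3.6, 3.0, 2.4, 3.7, 2.8)
) %>%
  mutate(hours_earned_rate = hours_earned / hours_attempted)

vars <- c("hours_attempted", "hours_earned", "mc_gpa", "hours_earned_rate")

# One Pearson correlation matrix per term year
cor_by_year <- lapply(
  split(toy, toy$term_year),
  function(d) cor(d[, vars], use = "complete.obs")
)

round(cor_by_year[["2020"]], 2)
```

On the real data the same pattern could be applied after filtering on full_part, giving one matrix per combination of status and term year.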